Identifying Data Noises, User Biases, and System Errors in Geo-tagged Twitter Messages (Tweets)

Authors

  • Ming-Hsiang Tsou
  • Hao Zhang
  • Chin-Te Jung
Abstract

Many social media researchers and data scientists collect geo-tagged tweets to conduct spatial analysis or to identify spatiotemporal patterns of messages filtered for specific topics or events. This paper provides a systematic view of the characteristics (data noises, user biases, and system errors) of geo-tagged tweets from the Twitter Streaming API. First, we found that a small percentage (1%) of active Twitter users can create a large portion (16%) of geo-tagged tweets. Second, a significant share (57.3%) of geo-tagged tweets is located outside the Twitter Streaming API’s bounding box in San Diego. Third, we can detect spam, bot, and cyborg tweets (data noises) by examining the “source” metadata field. The portion of data noises in geo-tagged tweets is significant (29.42% in San Diego, CA and 53.47% in Columbus, OH) in our case study. Finally, the majority of geo-tagged tweets are not created by the generic Twitter apps on Android or iPhone devices, but by other platforms, such as Instagram and Foursquare. We recommend a multi-step procedure to remove these noises in future research projects that use geo-tagged tweets.
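The abstract does not spell out the recommended multi-step procedure, so the following is only a minimal sketch of the kind of checks it describes: a bounding-box test, a filter on the "source" field, and a measure of how much of the data comes from the most active users. The flattened record layout (`user_id`, `lon`, `lat`, `source`), the function names, the bounding-box values, and the `trusted_sources` set are all assumptions made for illustration; they are not taken from the paper, and the "source" value is assumed to have been reduced to a plain label beforehand.

```python
from collections import Counter


def in_bounding_box(lon, lat, bbox):
    """Return True if the point lies inside bbox = (west, south, east, north)."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def filter_tweets(tweets, bbox, trusted_sources):
    """Keep tweets that fall inside bbox and come from a trusted source label."""
    kept = []
    for tweet in tweets:
        if not in_bounding_box(tweet["lon"], tweet["lat"], bbox):
            continue  # located outside the requested bounding box
        if tweet["source"] not in trusted_sources:
            continue  # source label not on the caller-supplied list
        kept.append(tweet)
    return kept


def share_of_top_users(tweets, top_fraction=0.01):
    """Fraction of tweets written by the most active top_fraction of users."""
    counts = Counter(tweet["user_id"] for tweet in tweets)
    if not counts:
        return 0.0
    n_top = max(1, int(len(counts) * top_fraction))  # always at least one user
    top_total = sum(count for _, count in counts.most_common(n_top))
    return top_total / len(tweets)


# Illustrative box and source list only (hypothetical, not the paper's values).
example_bbox = (-117.3, 32.5, -116.9, 33.1)
trusted_sources = {"Twitter for iPhone", "Twitter for Android"}

sample = [
    {"user_id": "a", "lon": -117.1, "lat": 32.7, "source": "Twitter for iPhone"},
    {"user_id": "a", "lon": -117.1, "lat": 32.8, "source": "Twitter for Android"},
    {"user_id": "b", "lon": -118.2, "lat": 34.0, "source": "Twitter for iPhone"},
    {"user_id": "c", "lon": -117.2, "lat": 32.7, "source": "HypotheticalJobBot"},
]

kept = filter_tweets(sample, example_bbox, trusted_sources)
top_share = share_of_top_users(sample)
```

Passing the source list in as a parameter keeps the sketch neutral about which platforms count as noise; the abstract notes that many geo-tagged tweets come from platforms such as Instagram and Foursquare, and whether to keep those is a decision for each study.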

Similar resources

Scaling laws in geo-located Twitter data

We observe and report on a systematic relationship between population density and Twitter use. Number of tweets, number of users and population per unit area are related by power laws, with exponents greater than one, that are consistent with each other and across a range of spatial scales. This implies that population density can accurately predict Twitter activity. Furthermore this trend can ...

Detecting Emergency Events and Geo-Location Awareness from Twitter Streams

The rapidly increasing number of messages on Twitter is quite interesting. Through Twitter streaming, this paper is capable of delivering tweets for any keywords or hashtags from clients all around the world in real time. However, semantic topic extraction and tracking user-interested news events from messages on Twitter can be considered a challenging task. This paper focused on detect...

Deriving retail centre locations and catchments from geo-tagged Twitter data

Article history: received 13 January 2016; received in revised form 20 August 2016; accepted 28 September 2016; available online 20 October 2016. This investigation offers an initial foray into the application of geo-tagged Twitter data for generating insights within two areas of retail geography: establishing retail centre locations and defining catchment areas. Retail-related Tweets were identifi...

Visualizing User-Defined, Discriminative Geo-Temporal Twitter Activity

We present a system that visualizes geo-temporal Twitter activity. The distinguishing features our system offers include (i) a large degree of user freedom in specifying the subset of data to visualize and (ii) a focus on discriminative patterns rather than high-volume patterns. Tweets with precise GPS co-ordinates are assigned to geographical cells and grouped by (i) tweet language, (ii) twee...

Discover Patterns and Mobility of Twitter Users - A Study of Four US College Cities

Geo-tagged tweets provide useful implications for studies in human geography, urban science, location-based services, targeted advertising, and social network. This research aims to discover the patterns and mobility of Twitter users by analyzing the spatial and temporal dynamics in their tweets. Geo-tagged tweets are collected over a period of six months for four US Midwestern college cities: ...


Journal title:
  • CoRR

Volume abs/1712.02433  Issue 

Pages  -

Publication date 2017